California Covid-19 Analytics

2023-11-17

Learn About The DataSet

In this presentation I analyze data retrieved from https://data.ca.gov/dataset, listed there as COVID-19 Vaccine Progress Dashboard Data. The file is available at https://data.ca.gov/dataset/covid-19-vaccine-progress-dashboard-data.

The data is from the same source as the Vaccine Progress Dashboard at https://covid19.ca.gov/vaccination-progress-data/. The data has been updated at least 3 times. The most recent update, from March 3, 2023, is the version I’m drawing from.

Objective

The data summarizes vaccination data at the county level by county of residence within the state of California. With this data set, I’m aiming to create two different plots that provide insights into COVID-19 hospitalization data.

The objective of the code is to filter the dataset, ensuring only relevant and complete information is used for visualization. Subsequently, I am aiming to generate accurate visual representations, empowering a deeper understanding of the COVID-19 hospitalization landscape across Californian counties. These visuals provide varying perspectives: the count of confirmed hospitalized patients per county and a composite index merging different factors, enabling comprehensive insights into the severity of hospitalizations.

This is a simple data exploration: we are just exploring trends in cases of COVID-19.

Loading Files

library(readr)  
library(dplyr)  
library(ggplot2)

Here we load the packages we need for reading, manipulating, and plotting the dataset we are going to analyze.

statewide_covid_19_hospital_county_data <- read_csv("~/Desktop/covid-19-hospital-data/statewide-covid-19-hospital-county-data.csv")

Here we read the dataset itself from its location on disk.
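The summary further down shows that todays_date is read as a character column. A minimal sketch of converting it to a proper date, assuming the month/day/two-digit-year format seen in the data preview (for example 4/20/20). The `sample_data` frame and the `report_date` column are made up for illustration and are not part of the original analysis.

```r
# Hypothetical three-row stand-in for the real dataset
sample_data <- data.frame(
  county = c("Shasta", "Riverside", "Kings"),
  todays_date = c("4/20/20", "4/20/20", "4/21/20"),
  stringsAsFactors = FALSE
)

# Convert the character column to Date (assumes month/day/two-digit year)
sample_data$report_date <- as.Date(sample_data$todays_date, format = "%m/%d/%y")

# Dates now sort and filter chronologically instead of alphabetically
print(range(sample_data$report_date))
```

With a real Date column, filtering to a time window or plotting trends over time becomes straightforward.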

View Our Current Summary

summary(statewide_covid_19_hospital_county_data)
##     county          todays_date        hospitalized_covid_confirmed_patients
##  Length:70927       Length:70927       Min.   :   0.00                      
##  Class :character   Class :character   1st Qu.:   1.00                      
##  Mode  :character   Mode  :character   Median :  11.00                      
##                                        Mean   :  73.34                      
##                                        3rd Qu.:  52.00                      
##                                        Max.   :8098.00                      
##                                        NA's   :8                            
##  hospitalized_suspected_covid_patients hospitalized_covid_patients
##  Min.   :-241.000                      Min.   :   0.00            
##  1st Qu.:   0.000                      1st Qu.:   1.00            
##  Median :   0.000                      Median :  12.00            
##  Mean   :   9.492                      Mean   :  82.62            
##  3rd Qu.:   8.000                      3rd Qu.:  63.00            
##  Max.   :1358.000                      Max.   :8422.00            
##  NA's   :9                             NA's   :1267               
##  all_hospital_beds icu_covid_confirmed_patients icu_suspected_covid_patients
##  Min.   :    0     Min.   :   0.00              Min.   : -2.000             
##  1st Qu.:   55     1st Qu.:   0.00              1st Qu.:  0.000             
##  Median :  278     Median :   2.00              Median :  0.000             
##  Mean   : 1181     Mean   :  16.02              Mean   :  1.444             
##  3rd Qu.: 1023     3rd Qu.:  10.00              3rd Qu.:  1.000             
##  Max.   :54966     Max.   :1731.00              Max.   :262.000             
##  NA's   :3602      NA's   :30                   NA's   :7235                
##  icu_available_beds
##  Min.   :-110.00   
##  1st Qu.:   2.00   
##  Median :   8.00   
##  Mean   :  41.21   
##  3rd Qu.:  29.00   
##  Max.   :1502.00   
##  NA's   :802

Data Cleaning and Preprocessing

# Check whether any value in the dataset is missing
missing_values <- any(is.na(statewide_covid_19_hospital_county_data))

# Handle missing values by dropping every row that contains at least one NA
clean_data <- na.omit(statewide_covid_19_hospital_county_data)

# Store the result of the check as a readable message
result_text <- paste("Any missing values in the dataset:", missing_values)
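Before dropping rows, it can help to see where the missing values sit and how many rows na.omit() removes. The summary above also shows negative minimums in some count columns, which na.omit() does not touch. The following is a minimal sketch on a hypothetical four-row data frame (`toy`), not the original analysis code.

```r
# Hypothetical miniature of the hospital dataset, with NAs and a negative count
toy <- data.frame(
  county = c("A", "B", "C", "D"),
  hospitalized_covid_confirmed_patients = c(10, NA, 3, 7),
  icu_available_beds = c(5, 2, NA, -1)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Number of rows that na.omit() drops (any NA in any column)
complete <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(complete)

# Rows that survive na.omit() but still hold a negative count
numeric_cols <- sapply(complete, is.numeric)
has_negative <- rowSums(complete[, numeric_cols, drop = FALSE] < 0) > 0

print(na_per_column)
print(rows_dropped)
print(complete[has_negative, ])
```

Whether negative counts should be removed, clipped to zero, or kept is a judgment call that depends on how the source defines them.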

Summary of Our Clean Data

| county | todays_date | hospitalized_covid_confirmed_patients | hospitalized_suspected_covid_patients | hospitalized_covid_patients | all_hospital_beds | icu_covid_confirmed_patients | icu_suspected_covid_patients | icu_available_beds |
|---|---|---|---|---|---|---|---|---|
| Shasta | 4/20/20 | 1 | 0 | 0 | 226 | 1 | 0 | 18 |
| Riverside | 4/20/20 | 247 | 147 | 9 | 164 | 76 | 29 | 134 |
| San Diego | 4/20/20 | 262 | 98 | 46 | 715 | 103 | 19 | 169 |
| San Francisco | 4/21/20 | 76 | 45 | 121 | 738 | 25 | 9 | 79 |
| Santa Clara | 4/21/20 | 144 | 34 | 178 | 2582 | 70 | 16 | 120 |
| Kings | 4/21/20 | 8 | 0 | 8 | 223 | 3 | 0 | 6 |

Aggregate Data by County:

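library(dplyr)

# Compute one row of summary metrics per county
county_summary <- clean_data %>%
  group_by(county) %>%
  summarize(
    # Mean count of hospitalized confirmed patients across the county's rows
    avg_hospitalized_confirmed = mean(hospitalized_covid_confirmed_patients),
    # Highest single-row count of suspected patients in the ICU
    max_icu_suspected = max(icu_suspected_covid_patients),
    # Bed counts summed over every reporting date for the county
    total_hospital_beds = sum(all_hospital_beds)
  )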

This code computes key summary metrics for each county in our dataset. Grouping the data by county lets us calculate indicators like the average count of hospitalized confirmed COVID-19 patients, the maximum count of suspected COVID-19 patients in the ICU, and the hospital bed count summed across all reporting dates. These statistics offer insight into how hospital resources are used across different regions during COVID-19 hospitalizations.
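Because each county appears once per reporting date, summing all_hospital_beds adds the same beds again for every date. A minimal sketch of the difference between a sum and a mean, using a hypothetical `toy` data frame; the column names `summed_over_days` and `mean_daily_beds` are illustrative and not part of the original analysis.

```r
library(dplyr)

# Hypothetical data: county A reports 100 beds on each of three days
toy <- data.frame(
  county = c("A", "A", "A", "B", "B"),
  all_hospital_beds = c(100, 100, 100, 40, 60)
)

bed_summary <- toy %>%
  group_by(county) %>%
  summarize(
    summed_over_days = sum(all_hospital_beds),  # grows with the number of reports
    mean_daily_beds = mean(all_hospital_beds)   # typical capacity on a single day
  )

print(bed_summary)
```

If the goal is to compare capacity between counties, the mean is usually easier to interpret because it does not depend on how many days a county reported.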

Data Visualization by County Walk-Through

This code uses ggplot2 to create a bar plot of the average count of hospitalized confirmed COVID-19 patients for each county in California. The visualization is useful for our analysis because it gives a clear side-by-side comparison across counties.

library(ggplot2)
ggplot(county_summary, aes(x = county, y = avg_hospitalized_confirmed)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Average Hospitalized Confirmed Patients by County")

Data Visualization by County

About Our Data Visualization and Next Steps

This code creates a visual representation of the average count of confirmed COVID-19 patients requiring hospitalization across different counties. It uses bars to display this information, with each bar representing a county. The height of each bar corresponds to the average count of hospitalized confirmed patients in that particular county. This graph helps us easily compare the average patient count between different areas, allowing us to identify variations in COVID-19 hospitalizations across counties.
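With dozens of counties on the x-axis, the labels can overlap and the bars appear in alphabetical order. One possible variation, sketched here on a hypothetical `toy_summary` data frame with made-up values, sorts the bars by height and flips the axes so county names read horizontally.

```r
library(ggplot2)

# Hypothetical averages for four counties; real values come from county_summary
toy_summary <- data.frame(
  county = c("Los Angeles", "Shasta", "Riverside", "Kings"),
  avg_hospitalized_confirmed = c(900, 12, 210, 25)
)

# reorder() sorts counties by their average; coord_flip() puts names on the y-axis
p <- ggplot(toy_summary,
            aes(x = reorder(county, avg_hospitalized_confirmed),
                y = avg_hospitalized_confirmed)) +
  geom_col(fill = "skyblue") +
  coord_flip() +
  labs(title = "Average Hospitalized Confirmed Patients by County",
       x = "County", y = "Average confirmed patients")

print(p)
```

geom_col() is equivalent to geom_bar(stat = "identity"), so the bar heights are the same as in the original plot.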

Now that we’ve cleaned our data and successfully plotted average COVID-19 hospitalizations per county, we want to include both hospitalized patients and ICU bed availability, so that we capture more variables and better understand how COVID-19 is affecting California as a whole.

3D plot visualizing confirmed COVID-19 cases, suspected cases, and hospitalized patients

3D plot visual analysis

In this 3D scatter plot, we visualize confirmed COVID-19 cases, suspected cases, and hospitalized patients as the x, y, and z-axes, respectively. However, the plot exhibits a broad distribution of data, making it challenging to discern distinct patterns. When cluster points in a 3D scatter plot seem disorganized or scattered in multiple directions, it could indicate disparate scales or ranges among the variables.

This dispersion can hinder a clear interpretation of the relationships between the variables. To improve clarity, consider normalizing the variables so that all axes share a consistent scale.
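The text does not say which normalization was used. As one common option, min-max scaling maps every variable onto the range 0 to 1. The sketch below assumes that method; the helper `min_max` and the `toy` values are hypothetical.

```r
# Min-max scaling: maps each variable onto [0, 1]
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))  # constant column: avoid 0/0
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical values on very different scales
toy <- data.frame(
  confirmed = c(0, 50, 8000),
  suspected = c(0, 10, 1300),
  hospitalized = c(0, 60, 8400)
)

# Apply the same scaling to every column
normalized <- as.data.frame(lapply(toy, min_max))
print(normalized)
```

Standardizing with scale() (mean 0, standard deviation 1) is another reasonable choice when outliers such as very large counties dominate the range.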

COVID-19 Patient Hospitalization Patterns plot

COVID-19 Patient Hospitalization Patterns

The normalized 3D scatter plot seems to identify a positive correlation between the number of hospitalized COVID-19 confirmed patients and the total hospitalized COVID-19 patients. The clearer and less cluttered visualization allows for a more apparent observation of this relationship.

Positive Correlation: As the count of hospitalized COVID-19 confirmed patients increases, there appears to be a simultaneous increase in the count of total hospitalized COVID-19 patients.

This matches the trend in the normalized 3D scatter plot: as the count of confirmed hospitalized patients rises, the total number of patients hospitalized with COVID-19 rises with it.
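A visual impression of correlation can be backed up with a number. A minimal sketch using the Pearson correlation coefficient from base R's cor(); the `toy` values are hypothetical and only illustrate the call.

```r
# Hypothetical paired observations of the two variables
toy <- data.frame(
  hospitalized_covid_confirmed_patients = c(1, 8, 76, 144, 262),
  hospitalized_covid_patients = c(1, 9, 110, 170, 350)
)

# Pearson correlation; "complete.obs" skips rows with missing values
r <- cor(
  toy$hospitalized_covid_confirmed_patients,
  toy$hospitalized_covid_patients,
  method = "pearson",
  use = "complete.obs"
)

print(r)
```

A value close to 1 indicates a strong positive linear relationship; it does not by itself say anything about cause.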

A Simple Stat Ratio

This plot shows the ratio between hospitalized confirmed patients and available hospital beds.
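The ratio itself is a single division per row. The sketch below assumes the denominator is the all_hospital_beds column and uses a hypothetical `toy` data frame; the column name `confirmed_per_bed` is illustrative. Since the summary shows a minimum of 0 for all_hospital_beds, rows with zero beds are set to NA rather than dividing by zero.

```r
# Hypothetical rows, including one with zero reported beds
toy <- data.frame(
  county = c("A", "B", "C"),
  hospitalized_covid_confirmed_patients = c(50, 10, 5),
  all_hospital_beds = c(500, 0, 100)
)

# Ratio of confirmed patients to beds; NA where no beds are reported
toy$confirmed_per_bed <- ifelse(
  toy$all_hospital_beds > 0,
  toy$hospitalized_covid_confirmed_patients / toy$all_hospital_beds,
  NA_real_
)

print(toy)
```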

Using the composite index

This plot looks very similar, but it offers a more nuanced view by combining multiple factors (hospitalized patients and ICU bed availability) into a weighted index, which may give a more comprehensive picture of the situation.
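The exact formula and weights behind the index are not given here. As a rough sketch of how such a weighted index can be built, the example below rescales both inputs to 0 to 1, inverts ICU availability so that fewer free beds means a higher score, and combines them with weights of 0.7 and 0.3. The weights, the helper `rescale01`, and the `toy` values are all assumptions chosen for illustration.

```r
# Rescale a numeric vector to the range [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical county-level values
toy <- data.frame(
  county = c("A", "B", "C"),
  hospitalized_covid_patients = c(100, 20, 60),
  icu_available_beds = c(5, 40, 20)
)

# Assumed weights (not taken from the original analysis)
w_hosp <- 0.7
w_icu <- 0.3

# Higher index = more hospitalized patients and fewer available ICU beds
toy$composite_index <-
  w_hosp * rescale01(toy$hospitalized_covid_patients) +
  w_icu * (1 - rescale01(toy$icu_available_beds))

print(toy)
```

Changing the weights shifts how much each factor drives the ranking, so they should be chosen and reported deliberately.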

Conclusion

Through the exploration of COVID-19 hospitalization data across Californian counties, several vital observations have surfaced:

The analysis reveals substantial variations in hospitalization counts among counties, with Los Angeles being the highest. Some areas exhibit higher average counts of hospitalized confirmed COVID-19 patients, indicating potential hotspots requiring closer attention and resource allocation.

In summary, this analysis offers insight into the COVID-19 hospitalization landscape in California. We analyzed data from Los Angeles, Riverside, Santa Clara, Orange, and many other counties. The patterns we found can support more informed decision-making and targeted interventions to combat the challenges posed by the pandemic.